Abstract

The problem of guessing a random string is revisited. A close relation between guessing and compression is 
first established. Then it is shown that if the sequence of distributions of the information spectrum satisfies the large 
deviation property with a certain rate function, then the limiting guessing exponent exists and is a scalar multiple of 
the Legendre-Fenchel dual of the rate function. Other sufficient conditions related to certain continuity properties 
of the information spectrum are briefly discussed. This approach highlights the importance of the information 
spectrum in determining the limiting guessing exponent. All known prior results are then re-derived as example
applications of our unifying approach.

Index Terms

guessing, length function, source coding, information spectrum, large deviations.

I. INTRODUCTION 

Let $X^n = (X_1, \ldots, X_n)$ denote $n$ letters of a process where each letter is drawn from a finite set $\mathbb{X}$
with joint probability mass function (pmf) $(P_n(x^n) : x^n \in \mathbb{X}^n)$. Let $x^n$ be a realization and suppose that
we wish to guess this realization by asking questions of the form "Is $X^n = x^n$?", stepping through the
elements of $\mathbb{X}^n$ until the answer is "Yes". We wish to do this using the minimum expected number of
guesses. There are several applications that motivate this problem. Consider cipher systems employed in
digital television or DVDs to block unauthorized access to special features. The ciphers used are amenable
to such exhaustive guessing attacks and it is of interest to quantify the effort needed by an attacker (Merhav
& Arikan [1]).

Massey [2] observed that the expected number of guesses is minimized by guessing in the decreasing
order of $P_n$-probabilities. Define the guessing function

$$G_n^* : \mathbb{X}^n \to \{1, 2, \ldots, |\mathbb{X}|^n\}$$

to be one such optimal guessing order^1. $G_n^*(x^n) = g$ implies that $x^n$ is the $g$th guess. Arikan [3] considered
the growth of $\mathbb{E}[G_n^*(X^n)^\rho]$ as a function of $n$ for an independent and identically distributed (iid) source
with marginal pmf $P_1$ and $\rho > 0$. He showed that the growth is exponential in $n$; the limiting exponent

$$E(\rho) := \lim_{n \to \infty} \frac{1}{n} \ln \mathbb{E}[G_n^*(X^n)^\rho] \qquad (1)$$

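exists and equals $\rho H_\alpha(P_1)$ with $\alpha = 1/(1+\rho)$, where $H_\alpha(P_n)$ is the Renyi entropy of order $\alpha$ for the
pmf $P_n$, given by

$$H_\alpha(P_n) := \frac{1}{1-\alpha} \ln\Big( \sum_{x^n \in \mathbb{X}^n} P_n(x^n)^\alpha \Big), \quad \alpha \neq 1. \qquad (2)$$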
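The exponent in (1) can be checked by brute force for a small iid source. The following is a minimal sketch, not part of the original development: the marginal `p1`, the value of `rho`, and the helper names are illustrative assumptions. It sorts all strings of length $n$ by probability (Massey's ordering), computes the exact guessing moment, and compares the normalized logarithm with $\rho H_\alpha(P_1)$.

```python
import itertools
import math

def renyi_entropy(pmf, alpha):
    """Renyi entropy (in nats) of order alpha != 1, as in eq. (2)."""
    return math.log(sum(p ** alpha for p in pmf if p > 0)) / (1.0 - alpha)

def optimal_guessing_moment(p1, n, rho):
    """Exact E[G*(X^n)^rho] for an iid source with marginal p1 (brute force)."""
    probs = sorted(
        (math.prod(p1[s] for s in word)
         for word in itertools.product(range(len(p1)), repeat=n)),
        reverse=True,
    )
    # The g-th most likely string is the g-th guess.
    return sum(p * g ** rho for g, p in enumerate(probs, start=1))

# Hypothetical ternary source, chosen only for illustration.
p1, rho = [0.7, 0.2, 0.1], 1.0
alpha = 1.0 / (1.0 + rho)
limit = rho * renyi_entropy(p1, alpha)
for n in (2, 4, 6, 8):
    exponent = math.log(optimal_guessing_moment(p1, n, rho)) / n
    print(n, round(exponent, 4), round(limit, 4))
```

For small $n$ the normalized exponent sits below the limit, and the gap shrinks slowly, roughly like $(\ln n)/n$, which is the size of the correction term that appears in Section II.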



^1 If there are several sequences with the same probability of occurrence, they may be guessed in any order without affecting the expected
number of guesses.



Malone & Sullivan [4] showed that the limiting exponent $E(\rho)$ of an irreducible Markov chain exists and
equals $(1+\rho)$ times the logarithm of the Perron-Frobenius eigenvalue of a matrix formed by raising each element of
the transition probability matrix to the power $\alpha$. From their proof, one obtains the more general result
that the limiting exponent exists for any source if the Renyi entropy rate of order $\alpha$,

$$\lim_{n \to \infty} n^{-1} H_\alpha(P_n), \qquad (3)$$

exists for $\alpha = 1/(1+\rho)$. Pfister & Sullivan [5] showed the existence of (1) for a class of stationary
probability measures, beyond Markov measures, that are supported on proper subshifts of $\mathbb{X}^{\mathbb{N}}$ [5]. A
particular example is that of shifts generated by finite-state machines. For such a class, they showed that
the guessing exponent has a variational characterization (see (25) later). For unifilar sources Sundaresan
[6] obtained a simplification of this variational characterization using a direct approach and the method
of types.

Merhav & Arikan remark that their proof in [7] for the limiting guessing exponent is equally applicable 
to finding the limiting exponent of the moment generating function of compression lengths. Moreover, 
the two exponents are the same. The latter is a problem studied by Campbell [8].

Our contribution is to give a large deviations perspective to these results, shed further light on the 
aforementioned connection between compression and guessing, and unify all prior results on existence of 
limiting guessing exponents. Specifically, we show that if the sequence of distributions of the information
spectrum $(1/n) \ln(1/P_n(X^n))$ (see Han [9]) satisfies the large deviation property, then the limiting
exponent exists. This is useful because several existing large deviations results can be readily applied. We
then show that all but one of the previously considered cases in the literature^2 satisfy this sufficient condition.
See Examples 1-5 in Section IV.

The large deviation theoretic ideas are already present in the works of Pfister & Sullivan [5] and the
method of types approach of Arikan & Merhav [7]. Our work however brings out the essential ingredient
(the sufficient conditions on the information spectrum), and enables us to see the previously obtained 
specific results under one light. 

The quest for a general sufficient condition under which the information spectrum satisfies a large 
deviation property is a natural line of inquiry, and one of independent interest, in view of the Shannon- 
McMillan-Breiman theorem which asserts that the information spectrum of a stationary and ergodic source 
converges to the Shannon entropy almost surely and in $L^q$, for all $q \geq 1$; see for example [11]. In
particular, the large deviation property implies exponentially fast convergence to entropy. In the several
specific examples we consider, the information spectrum does satisfy the large deviation property. One
sufficient condition for the weaker property of exponentially fast convergence to entropy is the so-called
blowing-up property. (See Marton & Shields [12, Th. 2], or the survey article by Shields [13].) One
family of sources, that includes most of the sources we consider in this paper and goes beyond, is that
of finitary encodings of memoryless processes, also called finitary processes. These are known to have
the blowing-up property, and therefore exponentially fast convergence to entropy (see Marton & Shields
[12, Th. 3]). It is an interesting open question to see if finitary processes, or what other sources with the
blowing up property, satisfy the large deviation property. 

The rest of the paper is organized as follows. Section II studies the tight relationship between guessing 
and compression. Section III states the relevant large deviations results and the main sufficiency results. 
Section IV re-derives prior results by showing that in each case the information spectrum satisfies the 
LDP. Section V contains proofs and section VI contains some concluding remarks. 

II. Guessing and Compression 

In this section we relate the problem of guessing to one of source compression. An interesting conclusion 
is that robust source compression strategies lead to robust guessing strategies. 

^2 These are cases without side information and key-rate constraints. The one exception is an example of Arikan & Merhav [7, Sec. VI-B]
for which one can show the existence of Renyi entropy rate rather directly via a subadditivity argument. See our technical report [10].

For ease of exposition, let us assume that the message space is simply $\mathbb{X}$. The extension to strings of
length $n$ is straightforward and will be returned to shortly. A guessing function

$$G : \mathbb{X} \to \{1, 2, \ldots, |\mathbb{X}|\}$$

is a bijection that denotes the order in which the elements of $\mathbb{X}$ are guessed. If $G(x) = g$, then the $g$th
guess is $x$. Let $\mathbb{N}$ denote the set of natural numbers. A length function

$$L : \mathbb{X} \to \mathbb{N}$$

is one that satisfies Kraft's inequality

$$\sum_{x \in \mathbb{X}} \exp_2\{-L(x)\} \leq 1, \qquad (4)$$

where we have used the notation $\exp_2\{-L(x)\} = 2^{-L(x)}$. To each guessing function $G$, we associate a
PMF $Q_G$ on $\mathbb{X}$ and a length function $L_G$ as follows.

Definition 1: Given a guessing function $G$, we say $Q_G$ defined by

$$Q_G(x) = c^{-1} \cdot G(x)^{-1}, \quad \forall x \in \mathbb{X}, \qquad (5)$$

is the PMF on $\mathbb{X}$ associated with $G$. The quantity $c$ in (5) is the normalization constant. We say $L_G$
defined by

$$L_G(x) = \lceil -\log_2 Q_G(x) \rceil, \quad \forall x \in \mathbb{X}, \qquad (6)$$

is the length function associated with $G$.
Observe that

$$c = \sum_{a \in \mathbb{X}} G(a)^{-1} = \sum_{i=1}^{|\mathbb{X}|} \frac{1}{i} \leq 1 + \ln |\mathbb{X}|, \qquad (7)$$

and therefore the PMF in (5) is well-defined. We record the intimate relationship between these associated
quantities in the following result. (This is also available in the proof of [14, Th. 1, p. 382]).
Proposition 1: Given a guessing function $G$, the associated quantities satisfy

$$c^{-1} \cdot Q_G(x)^{-1} = G(x) \leq Q_G(x)^{-1}, \qquad (8)$$
$$L_G(x) - 1 - \log_2 c \leq \log_2 G(x) \leq L_G(x). \qquad (9)$$

Proof: The first equality in (8) follows from the definition in (5), and the second inequality from the
fact that $c \geq 1$.

The upper bound in (9) follows from the upper bound in (8) and from (6). The lower bound in (9)
follows from

$$\log_2 G(x) = \log_2 \big( c^{-1} \cdot Q_G(x)^{-1} \big)$$
$$= -\log_2 Q_G(x) - \log_2 c$$
$$\geq \big( \lceil -\log_2 Q_G(x) \rceil - 1 \big) - \log_2 c$$
$$= L_G(x) - 1 - \log_2 c.$$

■
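Definition 1 and Proposition 1 can be traced on a toy example. The sketch below assumes a hypothetical five-letter message set and an arbitrary guessing order; the function name `associated_pmf_and_lengths` is ours. It builds $c$, $Q_G$ and $L_G$ from (5)-(7) and prints them together with the Kraft sum of $L_G$.

```python
import math

def associated_pmf_and_lengths(guess_order):
    """guess_order[k] is the (k+1)-th message guessed, so G(guess_order[k]) = k + 1."""
    G = {x: g for g, x in enumerate(guess_order, start=1)}
    c = sum(1.0 / g for g in G.values())              # normalization constant, eq. (7)
    Q = {x: 1.0 / (c * g) for x, g in G.items()}      # associated PMF, eq. (5)
    L = {x: math.ceil(-math.log2(Q[x])) for x in G}   # associated lengths, eq. (6)
    return G, c, Q, L

# Hypothetical guessing order over the messages a..e.
G, c, Q, L = associated_pmf_and_lengths(["d", "a", "c", "b", "e"])
kraft = sum(2.0 ** -l for l in L.values())
for x in sorted(G):
    print(x, G[x], round(Q[x], 4), L[x])
print("c =", round(c, 4), "Kraft sum =", round(kraft, 4))
```

Because $L_G(x) \geq -\log_2 Q_G(x)$, the Kraft sum is at most $\sum_x Q_G(x) = 1$, so $L_G$ is indeed a length function.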

We now associate a guessing function $G_L$ to each length function $L$.

Definition 2: Given a length function $L$, we define the associated guessing function $G_L$ to be the one
that guesses in the increasing order of $L$-lengths. Messages with the same $L$-length are ordered using an
arbitrary fixed rule, say the lexicographical order on $\mathbb{X}$. We also define the associated PMF $Q_L$ on $\mathbb{X}$ to
be

$$Q_L(x) = \frac{\exp_2\{-L(x)\}}{\sum_{a \in \mathbb{X}} \exp_2\{-L(a)\}}. \qquad (10)$$

Proposition 2: For a length function $L$, the associated PMF $Q_L$ and the guessing function $G_L$ satisfy the
following:

1) $G_L$ guesses messages in the decreasing order of $Q_L$-probabilities;
2)

$$\log_2 G_L(x) \leq \log_2 Q_L(x)^{-1} \leq L(x). \qquad (11)$$

Proof: The first statement is clear from the definition of $G_L$ and from (10).
Letting $1\{E\}$ denote the indicator function of an event $E$, we have as a consequence of statement 1)
that

$$G_L(x) \leq \sum_{a \in \mathbb{X}} 1\{Q_L(a) \geq Q_L(x)\} \leq \sum_{a \in \mathbb{X}} \frac{Q_L(a)}{Q_L(x)} = Q_L(x)^{-1}, \qquad (12)$$

which proves the left inequality in (11). This inequality was known to Wyner [15].
The last inequality in (11) follows from (10) and Kraft's inequality (4) as follows:

$$Q_L(x)^{-1} = \exp_2\{L(x)\} \cdot \sum_{a \in \mathbb{X}} \exp_2\{-L(a)\} \leq \exp_2\{L(x)\}.$$

■
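The converse association of Definition 2 can be sketched in the same way. The length function below is a hypothetical example that satisfies Kraft's inequality, and the helper names are ours; the code builds $G_L$ and $Q_L$ and prints the three quantities compared in (11), in the exponentiated form $G_L(x) \leq Q_L(x)^{-1} \leq 2^{L(x)}$.

```python
import math

def guessing_from_lengths(lengths):
    """Associated guessing function G_L: increasing L, ties broken lexicographically."""
    order = sorted(lengths, key=lambda x: (lengths[x], x))
    return {x: g for g, x in enumerate(order, start=1)}

def pmf_from_lengths(lengths):
    """Associated PMF Q_L of eq. (10)."""
    z = sum(2.0 ** -l for l in lengths.values())
    return {x: 2.0 ** -l / z for x, l in lengths.items()}

# Hypothetical length function; Kraft sum is 1/2 + 1/4 + 3/16 = 15/16 <= 1.
lengths = {"a": 1, "b": 2, "c": 4, "d": 4, "e": 4}
G_L = guessing_from_lengths(lengths)
Q_L = pmf_from_lengths(lengths)
for x in sorted(lengths):
    print(x, G_L[x], round(1.0 / Q_L[x], 3), 2 ** lengths[x])
```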

Let $\{L(x) \geq B\}$ denote the set $\{x \in \mathbb{X} \mid L(x) \geq B\}$. We then have the following easy to verify
corollary to Propositions 1 and 2.

Corollary 3: For a given $G$, its associated length function $L_G$, and any $B \geq 1$, we have

$$\{L_G(x) \geq B + 1 + \log_2 c\} \subset \{G(x) \geq \exp_2\{B\}\} \subset \{L_G(x) \geq B\}. \qquad (13)$$

Analogously, for a given $L$, its associated guessing function $G_L$, and any $B \geq 1$, we have

$$\{G_L(x) \geq \exp_2\{B\}\} \subset \{L(x) \geq B\}. \qquad (14)$$

The inequalities between the associates in (9) and (11) indicate the direct relationship between guessing
moments and Campbell's coding problem [8], and that the Renyi entropies are the optimal growth
exponents for guessing moments, as highlighted in the following Proposition.

Proposition 4: Let $L$ be any length function on $\mathbb{X}$, $G_L$ the guessing function associated with $L$, $P$ a
PMF on $\mathbb{X}$, $\rho \in (0, \infty)$, $L^*$ the length function that minimizes $\mathbb{E}[\exp_2\{\rho L^*(X)\}]$, where the expectation
is with respect to $P$, $G^*$ the guessing function that proceeds in the decreasing order of $P$-probabilities
and therefore the one that minimizes $\mathbb{E}[G^*(X)^\rho]$, and $c$ as in (7). Then

$$\frac{\mathbb{E}[G_L(X)^\rho]}{\mathbb{E}[G^*(X)^\rho]} \leq \frac{\mathbb{E}[\exp_2\{\rho L(X)\}]}{\mathbb{E}[\exp_2\{\rho L^*(X)\}]} \cdot \exp_2\{\rho(1 + \log_2 c)\}. \qquad (15)$$

Analogously, let $G$ be any guessing function, and $L_G$ its associated length function. Then

$$\frac{\mathbb{E}[\exp_2\{\rho L_G(X)\}]}{\mathbb{E}[\exp_2\{\rho L^*(X)\}]} \leq \frac{\mathbb{E}[G(X)^\rho]}{\mathbb{E}[G^*(X)^\rho]} \cdot \exp_2\{\rho(1 + \log_2 c)\}. \qquad (16)$$

Also,

$$\Big| \frac{1}{\rho} \log_2 \mathbb{E}[G^*(X)^\rho] - \frac{1}{\rho} \log_2 \mathbb{E}[\exp_2\{\rho L^*(X)\}] \Big| \leq 1 + \log_2 c. \qquad (17)$$

Proof: Observe that

$$\mathbb{E}[\exp_2\{\rho L(X)\}] \geq \mathbb{E}[G_L(X)^\rho] \qquad (18)$$
$$\geq \mathbb{E}[G^*(X)^\rho]$$
$$\geq \mathbb{E}[\exp_2\{\rho L_{G^*}(X)\}] \exp_2\{-\rho(1 + \log_2 c)\} \qquad (19)$$
$$\geq \mathbb{E}[\exp_2\{\rho L^*(X)\}] \exp_2\{-\rho(1 + \log_2 c)\}, \qquad (20)$$

where (18) follows from (11), and (19) from the left inequality in (9). The result in (15) immediately
follows. A similar argument shows (16). Finally, (17) follows from the inequalities leading to (20) by
setting $L = L^*$. ■
Thus if we have a length function whose performance is close to optimal, then its associated guessing
function is close to guessing optimal. The converse is true as well. Moreover, the optimal guessing
exponent is within $1 + \log_2 c$ of the optimal coding exponent for the length function.



A. Strings of length n 

Let us now consider strings of length $n$. Let $\mathbb{X}^n$ denote the set of messages and consider $n \to \infty$. Let
$\mathcal{M}(\mathbb{X}^n)$ denote the set of pmfs on $\mathbb{X}^n$. By a source, we mean a sequence of pmfs $(P_n : n \in \mathbb{N})$, where
$P_n \in \mathcal{M}(\mathbb{X}^n)$. We replace the normalization constant $c$ in (7) by $c_n$ and observe that

$$c_n \leq 1 + n \ln |\mathbb{X}|.$$

If we normalize both sides of equation (17) by $n$, the difference between two quantities as a function of $n$
decays as $O((\log_2 n)/n)$, and vanishes as $n$ tends to infinity. The following theorem follows immediately,
with a change of base to natural logarithms.
Theorem 5: Given $\rho > 0$, the limit

$$\lim_{n \to \infty} n^{-1} \ln \mathbb{E}[G_n^*(X^n)^\rho]$$

exists if and only if the limit

$$\lim_{n \to \infty} \inf_{L_n} n^{-1} \ln \mathbb{E}[\exp_2\{\rho L_n(X^n)\}]$$

exists. Furthermore, the two limits are equal.

It is therefore sufficient to restrict our attention to Campbell's coding problem [8] and study if the
limit

$$\lim_{n \to \infty} \inf_{L_n} \frac{1}{n} \ln \mathbb{E}[\exp\{(\rho \ln 2) L_n(X^n)\}] \qquad (21)$$

exists, where the infimum is taken over all length functions $L_n : \mathbb{X}^n \to \mathbb{N}$ and exponentiation is with
respect to the base of the natural logarithm.
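The vanishing gap behind Theorem 5 can be seen numerically. The sketch below, for a hypothetical binary alphabet, computes $c_n$ exactly as a harmonic sum, compares it with the bound $1 + n \ln |\mathbb{X}|$, and prints the per-letter gap $(1 + \log_2 c_n)/n$ obtained by normalizing (17).

```python
import math

def normalization_constant(alphabet_size, n):
    """c_n = sum_{i=1}^{|X|^n} 1/i, the constant of eq. (7) for strings of length n."""
    return math.fsum(1.0 / i for i in range(1, alphabet_size ** n + 1))

alphabet_size = 2  # illustrative choice
for n in (1, 2, 4, 8, 16):
    c_n = normalization_constant(alphabet_size, n)
    bound = 1.0 + n * math.log(alphabet_size)
    gap = (1.0 + math.log2(c_n)) / n  # per-letter gap between the two exponents
    print(n, round(c_n, 4), round(bound, 4), round(gap, 4))
```

The direct summation is only practical for small $n$, since the number of terms grows as $|\mathbb{X}|^n$; it is meant as an illustration of the decay rate, not as a tool for large $n$.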



B. Universality 

Before we proceed to studying the limit, we make a further remark on the connection between universal 
strategies for guessing and universal strategies for compression. 

Let $\mathbb{T}$ denote a class of sources. For each source in the class, let $P_n$ be its restriction to strings of length
$n$ and let $L_n^*$ denote an optimal length function that attains the minimum value $\mathbb{E}[\exp\{(\rho \ln 2) L_n^*(X^n)\}]$
among all length functions, the expectation being with respect to $P_n$. On the other hand, let $L_n$ be a
sequence of length functions for the class of sources that does not depend on the actual source within the
class. Suppose further that the length sequence $L_n$ is asymptotically optimal, i.e.,

$$\lim_{n \to \infty} \frac{1}{n\rho} \ln \mathbb{E}[\exp\{(\rho \ln 2) L_n(X^n)\}] = \lim_{n \to \infty} \frac{1}{n\rho} \ln \mathbb{E}[\exp\{(\rho \ln 2) L_n^*(X^n)\}],$$

for every source belonging to the class. $L_n$ is thus "universal" for (i.e., asymptotically optimal for all sources
in) the class. An application of (15) with $c_n$ in place of $c$ followed by the observation $(1 + \log_2 c_n)/n \to 0$
shows that the sequence of guessing strategies $G_{L_n}$ is asymptotically optimal for the class, i.e.,

$$\lim_{n \to \infty} \frac{1}{n\rho} \ln \mathbb{E}[G_{L_n}(X^n)^\rho] = \lim_{n \to \infty} \frac{1}{n\rho} \ln \mathbb{E}[G_n^*(X^n)^\rho].$$

Arikan and Merhav [7] provide a universal guessing strategy for the class of discrete memoryless sources
(DMS). For the class of unifilar sources with a known number of states, the minimum description length
encoding is asymptotically optimal for Campbell's coding length problem (see Merhav [16]). It follows
as a consequence of the above argument that guessing in the increasing order of description lengths
is asymptotically optimal. The left side of (15) is the extra factor in the expected number of guesses
(relative to the optimal value) due to lack of knowledge of the specific source in class. Sundaresan [17]
characterized this loss as a function of the uncertainty class.



III. Large Deviation Results 

We begin with some words on notation. Recall that $\mathcal{M}(\mathbb{X}^n)$ denotes the set of pmfs on $\mathbb{X}^n$. The Shannon
entropy for a $P_n \in \mathcal{M}(\mathbb{X}^n)$ is

$$H(P_n) = -\sum_{x^n \in \mathbb{X}^n} P_n(x^n) \ln P_n(x^n)$$

and the Renyi entropy of order $\alpha \neq 1$ is (2). The Kullback-Leibler divergence or relative entropy between
two pmfs $Q_n$ and $P_n$ is

$$D(Q_n \| P_n) = \begin{cases} \sum_{x^n \in \mathbb{X}^n} Q_n(x^n) \ln \frac{Q_n(x^n)}{P_n(x^n)}, & \text{if } Q_n \ll P_n, \\ \infty, & \text{otherwise,} \end{cases}$$

where $Q_n \ll P_n$ means $Q_n$ is absolutely continuous with respect to $P_n$. Recall that a source is a sequence
of pmfs $(P_n : n \in \mathbb{N})$ where $P_n \in \mathcal{M}(\mathbb{X}^n)$. It is usually obtained via $n$-length marginals of some
probability measure in $\mathcal{M}(\mathbb{X}^{\mathbb{N}})$. Also recall the definitions of limiting guessing exponent in (1) and Renyi
entropy rate in (3) when the limits exist. $G_n^*$ is an optimal guessing function for a pmf $P_n \in \mathcal{M}(\mathbb{X}^n)$.
From the results in Section II on the equivalence between guessing and compression, it is sufficient to
focus on the Campbell coding problem.

Our first contribution is a proof of the following implicit result of Malone & Sullivan [4]. The proof
is given in Section V-A.

Proposition 6: Let $\rho > 0$. For a source $(P_n : n \in \mathbb{N})$, $E(\rho)$ exists if and only if the Renyi entropy rate
(3) exists. Furthermore, $E(\rho)/\rho$ equals the Renyi entropy rate.

The question now boils down to the existence of the limit in the definition of Renyi entropy rate. The
theory of large deviations immediately yields a sufficient condition. We begin with a definition.

Definition 3 (Large deviation property): [18, Def. II.3.1] A sequence $(\nu_n : n \in \mathbb{N})$ of probability
measures on $\mathbb{R}$ satisfies the large deviation property (LDP) with rate function $I : \mathbb{R} \to [0, \infty]$ if the
following conditions hold:

• $I$ is lower semicontinuous on $\mathbb{R}$;

• $I$ has compact level sets;

• $\limsup_{n \to \infty} n^{-1} \ln \nu_n\{K\} \leq -\inf_{t \in K} I(t)$ for each closed subset $K$ of $\mathbb{R}$;

• $\liminf_{n \to \infty} n^{-1} \ln \nu_n\{G\} \geq -\inf_{t \in G} I(t)$ for each open subset $G$ of $\mathbb{R}$.

Several commonly encountered sources satisfy the LDP with known and well-studied rate functions.
We describe some of these in the examples treated subsequently.

Let $\nu_n$ denote the distribution of the information spectrum given by the real-valued random variable
$-n^{-1} \ln P_n(X^n)$. The following proposition gives a sufficient condition for the existence of the limiting
Renyi entropy rate (and therefore the limiting guessing exponent).

Proposition 7: Let the sequence of distributions $(\nu_n : n \in \mathbb{N})$ of the information spectrum satisfy the
LDP with rate function $I$. Then the limiting Renyi entropy rate of order $1/(1+\rho)$ exists for all $\rho > 0$
and equals

$$\beta^{-1} \sup_{t \in \mathbb{R}} \{\beta t - I(t)\},$$

where $\beta = \rho/(1+\rho)$. Consequently, the limiting guessing exponent exists and equals

$$(1+\rho) \sup_{t \in \mathbb{R}} \{\beta t - I(t)\}.$$

The function $I^*(\beta) := \sup_{t \in \mathbb{R}} \{\beta t - I(t)\}$ is the Legendre-Fenchel dual of the rate function $I$. Proposition
7 says that, under the sufficient condition, the limiting guessing exponent equals $(1+\rho) I^*(\rho/(1+\rho))$,
and is thus directly related to the large deviations rate function for information spectrum. This is however
different from Merhav & Arikan's [7, Th. 2] for memoryless sources which states that the limiting guessing
exponent is the Legendre-Fenchel dual of the source coding error exponent function. We refer the reader to
Merhav and Arikan [7, Sec. IV] for further interesting connections between source coding error exponent,
guessing exponent, and two other exponents related to lossy source coding.
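Proposition 7 can be sanity-checked numerically for an iid source, where the information spectrum is a sample mean of the iid variables $-\ln P_1(X_i)$ and the rate function is the Legendre-Fenchel dual of their cumulant (Cramer's theorem). The sketch below is an illustration under assumptions of our own: the marginal `p1`, the grid ranges, and the function names are all hypothetical. It evaluates $I(t)$ and then $I^*(\beta)$ on finite grids and returns $(1+\rho) I^*(\beta)$, which should be close to $\rho H_\alpha(P_1)$.

```python
import math

def cumulant(p1, theta):
    """ln E[exp(theta * (-ln P1(X1)))] = ln sum_x P1(x)^(1 - theta)."""
    return math.log(sum(p ** (1.0 - theta) for p in p1))

def guessing_exponent_via_ldp(p1, rho, grid=400):
    """(1 + rho) * I*(beta), with both suprema approximated on finite grids."""
    beta = rho / (1.0 + rho)
    # Grid for the tilt parameter; beta itself is included on purpose.
    thetas = [-10.0 + 20.0 * k / grid for k in range(grid + 1)] + [beta]
    lam = [cumulant(p1, th) for th in thetas]
    # The information spectrum of an iid source lies in [lo, hi].
    lo, hi = -math.log(max(p1)), -math.log(min(p1))
    best = -math.inf
    for k in range(grid + 1):
        t = lo + (hi - lo) * k / grid
        rate = max(th * t - l for th, l in zip(thetas, lam))  # I(t), grid version
        best = max(best, beta * t - rate)                     # I*(beta), grid version
    return (1.0 + rho) * best

p1 = [0.7, 0.2, 0.1]  # hypothetical marginal
for rho in (0.5, 1.0, 2.0):
    alpha = 1.0 / (1.0 + rho)
    target = rho * math.log(sum(p ** alpha for p in p1)) / (1.0 - alpha)
    print(rho, round(guessing_exponent_via_ldp(p1, rho), 5), round(target, 5))
```

The grid version can only undershoot the closed form slightly, because including $\theta = \beta$ in the tilt grid forces $\beta t - I(t) \leq$ cumulant at $\beta$ for every $t$.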

Let us briefly discuss another approach to verify the existence of Renyi entropy rate (see Proposition
6). With $\alpha = 1/(1+\rho)$, we can rewrite $1 - \alpha$ times the Renyi entropy rate in (3) as

$$(1 - \alpha) \lim_{n \to \infty} n^{-1} H_\alpha(P_n) = \lim_{n \to \infty} n^{-1} \ln \sum_{x^n \in \mathbb{X}^n} \exp\{-n \alpha F_n(x^n)\} \, U_n(x^n), \qquad (22)$$

where

$$F_n(x^n) := \big( -n^{-1} \ln P_n(x^n) - (\ln |\mathbb{X}|)/\alpha \big),$$

and $U$ is the iid process on $\mathbb{X}^{\mathbb{N}}$ with uniform marginal on $\mathbb{X}$. One can then view $\alpha \in (0, 1)$ as the inverse
temperature (when $\rho > 0$) of a statistical mechanical system, $F_n(x^n)$ as the energy of the configuration
$x^n$, and the right side of (22) as a scaled version of (i.e., $\alpha$ times) the specific Gibbs free energy of
the corresponding statistical mechanical system, if the limit exists. This viewpoint is particularly useful
because the iid process $U$ satisfies a sample path large deviation property. If the information spectrum
sequence satisfies the continuity conditions in Varadhan [19, Th. 3.4], then the limiting specific Gibbs
free energy exists, and so does the Renyi entropy rate. Our technical report [10] treats an example via
this more general approach.

A. Additional results from Large Deviations Theory

In order to study the examples in Section IV, we state some additional results on LDP of transformed
variables. (See [20, Sec. 4.2], [21, Th. 6.12 and 6.14].)

Proposition 8 (Contraction Principle): Let $(\xi_n : n \in \mathbb{N})$ denote a sequence of $\mathcal{X}$-valued random
variables where $\mathcal{X}$ is a complete separable metric space (Polish space). Let $\nu_n$ denote the distribution
of $\xi_n$ for $n \in \mathbb{N}$, and let the sequence of distributions $(\nu_n : n \in \mathbb{N})$ on $\mathcal{X}$ satisfy the LDP with rate
function $I : \mathcal{X} \to [0, \infty]$. Let $\phi : \mathcal{X} \to \mathbb{R}$ be a continuous function. The sequence of distributions of
$(\phi(\xi_n) : n \in \mathbb{N})$ on $\mathbb{R}$ also satisfies the LDP with rate function $J : \mathbb{R} \to [0, \infty]$ given by

$$J(y) = \inf\{I(x) : x \in \mathcal{X}, \ \phi(x) = y\}.$$

Proposition 9 (Exponential Approximation): Suppose that the sequence of distributions of $(\xi_n : n \in \mathbb{N})$
satisfies the LDP with rate function $I$ on $\mathbb{R}$. Assume also that the sequence of random variables $(\zeta_n : n \in \mathbb{N})$
is superexponentially close to $(\xi_n : n \in \mathbb{N})$ in the following sense: for each $\delta > 0$

$$\limsup_{n \to \infty} \frac{1}{n} \ln \Pr\{|\xi_n - \zeta_n| > \delta\} = -\infty. \qquad (23)$$

Then the sequence of distributions of $(\zeta_n : n \in \mathbb{N})$ also satisfies the LDP on $\mathbb{R}$ with the same rate function
$I$. The condition in (23) is satisfied if

$$\lim_{n \to \infty} \sup_{\omega \in \Omega} |\xi_n(\omega) - \zeta_n(\omega)| = 0, \qquad (24)$$

where $\Omega$ is the underlying sample space.

IV. Examples 

We are now ready to apply Proposition 7 and related techniques to various examples. In the first five
examples that follow, our goal is to show that the sufficient condition for the existence of the limiting
guessing exponent holds, i.e., that the sequence of distributions of the information spectrum satisfies the
LDP.



A. LDP for information spectrum 

Example 1 (An iid source): This example was first studied by Arikan [3]. Recall that an iid source is
one for which $P_n(x^n) = \prod_{i=1}^n P_1(x_i)$, where $P_1$ is the marginal of $X_1$. It is then clear that the information
spectrum can be written as a sample mean of iid random variables

$$-n^{-1} \ln P_n(X^n) = -n^{-1} \sum_{i=1}^n \ln P_1(X_i).$$

It is well-known that the sequence $(\nu_n : n \in \mathbb{N})$ of distributions of this sample mean satisfies the LDP with
rate function $I$ given by the Legendre-Fenchel dual of the cumulant of the random variable $-\ln P_1(X_1)$
(see for example [18, Th. II.4.1] or [9, eqn. (1.9.66-67)]):

$$\ln \mathbb{E}\big[ \exp\{\beta(-\ln P_1(X_1))\} \big] = \ln \Big( \sum_{x \in \mathbb{X}} P_1(x)^\alpha \Big) = (1 - \alpha) H_\alpha(P_1).$$

The Legendre-Fenchel dual of the rate function is therefore the cumulant itself ([18, Th. VI.4.1.e]). An
application of Proposition 7 yields that $(1 + \rho)$ times this cumulant, given by $\rho H_\alpha(P_1)$, is the guessing
exponent. We thus recover Arikan's result [3].

The rate function $I$ can also be obtained using the contraction principle (Proposition 8) as follows. This
method will provide a recipe to obtain the limiting guessing exponent in subsequent examples. Consider
a mapping that takes $x^n$ to its empirical pmf in $\mathcal{M}(\mathbb{X})$. Empirical pmf is then a random variable. The
distribution of $X^n$ induces a pmf on $\mathcal{M}(\mathbb{X})$. It is well-known that the sequence of distributions of these
empirical pmfs, indexed by $n$, satisfies the level-2 LDP^3 with rate function $I_{P_1}^{(2)}(\cdot) = D(\cdot \| P_1)$. See for
example [18, Th. II.4.3]. Observe that the mapping from the empirical pmf to the information spectrum
random variable is continuous. We can therefore use the contraction principle to get a formula for $I$ in
terms of $I_{P_1}^{(2)}$ as follows [18, Th. II.5.1]. For any $t$ in $\mathbb{R}$, let

$$\theta(t) := \Big\{ Q \in \mathcal{M}(\mathbb{X}) : \sum_{x \in \mathbb{X}} Q(x) \ln \frac{1}{P_1(x)} = t \Big\},$$

i.e.,

$$\theta(t) = \big\{ Q \in \mathcal{M}(\mathbb{X}) : H(Q) + D(Q \| P_1) = t \big\}.$$

Then

$$I(t) = \inf\big\{ I_{P_1}^{(2)}(Q) : Q \in \theta(t) \big\}.$$

Using this, we can write

$$I^*(\beta) = \sup_{t \in \mathbb{R}} \Big\{ \beta t - \inf_{Q \in \theta(t)} D(Q \| P_1) \Big\}$$
$$= \sup_{t \in \mathbb{R}} \sup_{Q \in \theta(t)} \big\{ \beta t - D(Q \| P_1) \big\}$$
$$= \sup_{Q \in \mathcal{M}(\mathbb{X})} \big\{ \beta (H(Q) + D(Q \| P_1)) - D(Q \| P_1) \big\}$$
$$= (1 + \rho)^{-1} \sup_{Q \in \mathcal{M}(\mathbb{X})} \big\{ \rho H(Q) - D(Q \| P_1) \big\},$$

thus yielding

$$E(\rho) = \sup_{Q \in \mathcal{M}(\mathbb{X})} \big\{ \rho H(Q) - D(Q \| P_1) \big\}. \qquad (25)$$

This formula extends to more general sources, as is seen in the next few examples.
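The variational formula (25) can be checked on a binary alphabet by a direct grid search over $Q$. The sketch below uses a hypothetical marginal `p1`. It also evaluates the objective at the tilted pmf $Q^* \propto P_1^\alpha$; that this pmf attains the supremum is a standard calculation that is not spelled out in the text, so treat it as an assumption of the sketch.

```python
import math

def entropy(q):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in q if x > 0)

def divergence(q, p):
    """Kullback-Leibler divergence D(q || p) in nats, assuming p > 0 everywhere."""
    return sum(x * math.log(x / y) for x, y in zip(q, p) if x > 0)

def objective(q, p1, rho):
    """The bracketed term of eq. (25): rho * H(Q) - D(Q || P1)."""
    return rho * entropy(q) - divergence(q, p1)

def tilted(p1, alpha):
    """Candidate maximizer Q* proportional to P1^alpha (assumption of this sketch)."""
    z = sum(p ** alpha for p in p1)
    return [p ** alpha / z for p in p1]

p1, rho = [0.8, 0.2], 1.0  # hypothetical binary source
alpha = 1.0 / (1.0 + rho)
closed_form = rho * math.log(sum(p ** alpha for p in p1)) / (1.0 - alpha)
grid_max = max(objective([k / 1000, 1 - k / 1000], p1, rho) for k in range(1001))
at_tilted = objective(tilted(p1, alpha), p1, rho)
print(round(grid_max, 6), round(at_tilted, 6), round(closed_form, 6))
```

All three printed numbers should agree to several decimal places, the last being $\rho H_\alpha(P_1)$ from Example 1.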

Example 2 (Markov source): This example was studied by Malone & Sullivan [4]. Consider an irreducible
Markov chain taking values on $\mathbb{X}$ with transition probability matrix $\pi$. Our goal is to verify that
the sufficient condition holds and to calculate $E(\rho)$ defined by (1) for this source.

Let $\mathcal{M}_s(\mathbb{X}^2)$ denote the set of stationary pmfs defined by

$$\mathcal{M}_s(\mathbb{X}^2) = \Big\{ Q \in \mathcal{M}(\mathbb{X}^2) : \sum_{x_1 \in \mathbb{X}} Q(x_1, x) = \sum_{x_2 \in \mathbb{X}} Q(x, x_2) \ \forall x \in \mathbb{X} \Big\}.$$

Denote the common marginal by $q$ and let

$$\eta(\cdot \mid x_1) := \begin{cases} Q(x_1, \cdot)/q(x_1), & \text{if } q(x_1) \neq 0, \\ 1/|\mathbb{X}|, & \text{otherwise.} \end{cases}$$

^3 Level-1 refers to sequence of distributions (indexed by $n$) of sample means, level-2 refers to sample histograms, and level-3 to sample
paths.

We may then denote $Q = q \times \eta$, where $q$ is the distribution of $X_1$ and $\eta$ the conditional distribution of
$X_2$ given $X_1$. It is once again well known that the empirical pmf random variable satisfies the level-2
LDP with rate function $I_\pi^{(2)}(Q)$, given by [22]

$$I_\pi^{(2)}(Q) = D(\eta \| \pi \mid q) := \sum_{x_1 \in \mathbb{X}} q(x_1) D(\eta(\cdot \mid x_1) \| \pi(\cdot \mid x_1)).$$

As in Example 1, the contraction principle then yields that the sequence of distributions of information
spectrum satisfies the LDP with rate function $I$ given by

$$I(t) = \inf\big\{ I_\pi^{(2)}(Q) : Q \in \theta(t) \big\},$$

where for $t$ in $\mathbb{R}$, $\theta(t) \subset \mathcal{M}_s(\mathbb{X}^2)$ is defined by

$$\theta(t) = \Big\{ Q \in \mathcal{M}_s(\mathbb{X}^2) : \sum_{x_1, x_2} Q(x_1, x_2) \ln \frac{1}{\pi(x_2 \mid x_1)} = t \Big\}.$$

By Proposition 7 the limiting guessing exponent exists. Perron-Frobenius theory (Seneta [23, Ch. 1],
see also [24, pp. 60-61]) yields the cumulant directly as $\ln \lambda(\beta)$, where $\lambda(\beta)$ is the unique largest eigenvalue
(Perron-Frobenius eigenvalue) of a matrix formed by raising each element of $\pi$ to the power $\alpha$. (Recall
that $\alpha = 1/(1+\rho)$ and $\beta = \rho/(1+\rho)$.) Thus $E(\rho) = (1+\rho) \ln \lambda(\beta)$, and we recover the result of Malone
& Sullivan [4]. It is useful to note that the steps that led to (25) hold in the Markov case (with appropriate
changes to entropy and divergence terms) and we may write

$$E(\rho) = \sup_{Q \in \mathcal{M}_s(\mathbb{X}^2)} \big\{ \rho H(\eta \mid q) - D(\eta \| \pi \mid q) \big\}, \qquad (26)$$

where $H(\eta \mid q)$ is the conditional entropy of $X_2$ given $X_1$ under the joint distribution $Q$, i.e.,

$$H(\eta \mid q) := \sum_{x \in \mathbb{X}} q(x) H(\eta(\cdot \mid x)).$$
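The Perron-Frobenius expression $E(\rho) = (1+\rho) \ln \lambda(\beta)$ is easy to evaluate numerically. The sketch below uses plain power iteration, which is our choice of method and assumes a transition matrix with strictly positive entries (so that the iteration converges); the matrix `pi` and the function names are hypothetical.

```python
import math

def perron_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of an entrywise positive matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)              # v sums to one, so this estimates the eigenvalue
        v = [x / lam for x in w]  # renormalize so that v sums to one again
    return lam

def markov_guessing_exponent(pi, rho):
    """(1 + rho) * ln of the Perron-Frobenius eigenvalue of the matrix [pi_ij ** alpha]."""
    alpha = 1.0 / (1.0 + rho)
    powered = [[p ** alpha for p in row] for row in pi]
    return (1.0 + rho) * math.log(perron_eigenvalue(powered))

pi = [[0.9, 0.1], [0.4, 0.6]]  # hypothetical two-state chain
print(round(markov_guessing_exponent(pi, 1.0), 6))
```

When all rows of `pi` are equal the chain is iid, the eigenvalue reduces to $\sum_x P_1(x)^\alpha$, and the value coincides with $\rho H_\alpha(P_1)$ of Example 1.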

Example 3 (Unifilar source): This example was studied by Sundaresan in [6]. A unifilar source is a
generalization of the Markov source in Example 2. Let $\mathbb{X}$ denote the alphabet set as before. In addition,
let $\mathbb{S}$ denote a finite set of states. Fix an initial state $s_0$ and let the joint probability of observing $(x^n, s^n)$
be

$$P_n(x^n, s^n) = \prod_{i=1}^n \pi(x_i, s_i \mid s_{i-1}),$$

where $\pi(x_i, s_i \mid s_{i-1})$ is the joint probability of $(x_i, s_i)$ given the previous state $s_{i-1}$. The dependence of
$P_n$ on $s_0$ is understood. Furthermore, assume that $\pi(x_i, s_i \mid s_{i-1})$ is such that $s_i = \phi(s_{i-1}, x_i)$, where $\phi$ is
a deterministic function that is one-to-one for each fixed $s_{i-1}$. Such a source is called a unifilar source.

$P_{S,X}(s_{i-1}, x_i)$ and $\phi$ completely specify the process: the initial state $S_0$ is random with distribution that
of marginal of $S$ in $P_{S,X}$, the rest being specified by $P_{X|S}(x_i \mid s_{i-1})$ and $\phi$. Example 2 is a unifilar source
with $\mathbb{S} = \mathbb{X}$, $\phi(s_{i-1}, x_i) = x_i$, and $P_{S,X} = q \times \pi$ where $q$ is the stationary distribution of the Markov
chain.

Let $\mathcal{M}_s(\mathbb{S} \times \mathbb{X})$ denote the set of joint measures on the indicated space so that the resulting process
$(S_n : n \geq 0)$ is a stationary and irreducible Markov chain. Let a $Q \in \mathcal{M}_s(\mathbb{S} \times \mathbb{X})$ be written as $Q = q \times \eta$.
For any $t$ in $\mathbb{R}$, let

$$\theta(t) := \Big\{ Q \in \mathcal{M}_s(\mathbb{S} \times \mathbb{X}) : \sum_{(s,x)} Q(s, x) \ln \frac{1}{\pi(x \mid s)} = t \Big\}.$$

Then the sequence of distributions of information spectrum $-n^{-1} \ln P_n(X^n)$ satisfies the LDP ([9, eqn.
(1.9.30)]) with rate function given (once again via contraction principle) by

$$I(t) = \inf\{ D(\eta \| \pi \mid q) : Q \in \theta(t) \}.$$

The limiting exponent therefore exists. Following the same procedure that led to (25) in the iid case and
(26) for a Markov source, we get

$$E(\rho) = \sup_{Q \in \mathcal{M}_s(\mathbb{S} \times \mathbb{X})} \big\{ \rho H(\eta \mid q) - D(\eta \| \pi \mid q) \big\}, \qquad (27)$$

where $H(\eta \mid q)$ and $D(\eta \| \pi \mid q)$ are analogously defined, and the result of Sundaresan [6] is recovered.

Example 4 (A class of stationary sources): Pfister & Sullivan [5] considered a class of stationary sources
with distribution $P \in \mathcal{M}(\mathbb{X}^{\mathbb{N}})$ that satisfies two hypotheses H1 and H2 of [5, Sec. II-B], which we will
now describe.

Let $\mathcal{M}^P(\mathbb{X}^{\mathbb{N}})$ denote the set of sources that satisfy $Q_n \ll P_n$ for all $n \in \mathbb{N}$, where $P_n$ and $Q_n$ are
restrictions of $P$ and $Q$ to $n$ letters. Note that it may be possible that a $Q \in \mathcal{M}^P(\mathbb{X}^{\mathbb{N}})$ is not absolutely
continuous with respect to $P$. Also, let $\mathcal{M}_s^P(\mathbb{X}^{\mathbb{N}}) \subset \mathcal{M}^P(\mathbb{X}^{\mathbb{N}})$ denote the subset of stationary sources with
respect to the shift operator $\tau : \mathbb{X}^{\mathbb{N}} \to \mathbb{X}^{\mathbb{N}}$ defined by

$$(\tau(x))_i = x_{i+1}, \quad \forall i \in \mathbb{N}.$$

Hypothesis H1 of Pfister & Sullivan [5] assumes that for any neighborhood of a stationary source $Q \in
\mathcal{M}_s^P(\mathbb{X}^{\mathbb{N}})$ and any $\varepsilon > 0$, there exists an ergodic $Q' \in \mathcal{M}_s^P(\mathbb{X}^{\mathbb{N}})$ in that neighborhood such that $H(Q') \geq
H(Q) - \varepsilon$, where $H(Q)$ is the Shannon entropy rate of source $Q$. Their hypothesis H2 is given by (30)
below.

Under these hypotheses, Pfister & Sullivan [5] proved that $E(\rho)$ exists, and provided a variational
characterization analogous to (27), i.e.,

$$E(\rho) = \sup_{Q \in \mathcal{M}_s^P(\mathbb{X}^{\mathbb{N}})} \big\{ \rho H(Q) - D(Q \| P) \big\}, \qquad (28)$$

where

$$D(Q \| P) = \lim_{n \to \infty} n^{-1} \sum_{x^n \in \mathbb{X}^n} Q_n(x^n) \ln \frac{Q_n(x^n)}{P_n(x^n)}.$$

En route to this result, Pfister & Sullivan [5] showed that the sequence of distributions of the empirical
process satisfies the level-3 LDP for sample paths. We first state this precisely, and then use this as the
starting point to show the sufficient condition that the information spectrum satisfies the LDP.

For an $x \in \mathbb{X}^{\mathbb{N}}$ given by $x = (x_1, x_2, \ldots)$, we define $x^n = (x_1, \ldots, x_n)$ as the first $n$ components
of $x$ in the usual way. Consider a stationary source $P$ whose letters are $X = (X_1, X_2, \ldots)$. Define the
empirical process of measures

$$T_n(x, \cdot) := n^{-1} \sum_{i=0}^{n-1} \delta_{\tau^i(x)}(\cdot).$$

This is a measure on $\mathbb{X}^{\mathbb{N}}$ that puts mass $1/n$ on the following strings: $x, \tau(x), \tau^2(x), \ldots, \tau^{n-1}(x)$. Pfister
& Sullivan showed that the distributions of the $\mathcal{M}(\mathbb{X}^{\mathbb{N}})$-valued process $T_n(X, \cdot)$ satisfy the level-3 LDP
with rate function $I_P^{(3)}(\cdot) = D(\cdot \| P)$ under hypotheses H1 and H2 of their paper ([5, Prop. 2.2-2.3]).
Furthermore,

$$D(Q \| P) = +\infty, \quad Q \notin \mathcal{M}_s^P(\mathbb{X}^{\mathbb{N}}), \qquad (29)$$
so that we may restrict $D(\cdot \| P)$ to $\mathcal{M}_s^P(\mathbb{X}^{\mathbb{N}})$. We next use this to show that the information spectrum
satisfies the LDP.

Hypothesis H2 of Pfister & Sullivan assumes the existence of a continuous mapping $e_P : \mathbb{X}^{\mathbb{N}} \to \mathbb{R}$
satisfying

$$\lim_{n \to \infty} \sup_{x} \Big| n^{-1} \ln P_n(x^n) + \int e_P \, dT_n(x, \cdot) \Big| = 0, \qquad (30)$$

where the supremum is taken over the set $\{x \in \mathbb{X}^{\mathbb{N}} : P_n(x^n) > 0\}$.

By the compactness of $\mathbb{X}^{\mathbb{N}}$, $e_P$ is uniformly continuous. Under the weak topology on the complete
separable metric space $\mathcal{M}(\mathbb{X}^{\mathbb{N}})$, the mapping

$$\phi : \mathcal{M}(\mathbb{X}^{\mathbb{N}}) \to \mathbb{R}$$

defined by $Q \mapsto \int_{\mathbb{X}^{\mathbb{N}}} e_P \, dQ$ is a continuous mapping. Hence by the contraction principle, by setting
$\mathcal{X} = \mathcal{M}(\mathbb{X}^{\mathbb{N}})$ we get that the sequence of distributions of $(\phi(T_n(X, \cdot)) : n \in \mathbb{N})$ satisfies the LDP with
rate function $I$ given by

$$I(t) = \inf\big\{ D(Q \| P) : Q \in \mathcal{M}_s^P(\mathbb{X}^{\mathbb{N}}), \ \phi(Q) = t \big\},$$

where the restriction of the infimum to $\mathcal{M}_s^P(\mathbb{X}^{\mathbb{N}})$ follows from (29). Furthermore, given hypothesis H2 and
(30), an application of the exponential approximation principle (Proposition 9) indicates that the sequence
of distributions of the information spectrum too satisfies the LDP with the same rate function $I$, and we
have verified that the sufficient condition holds.

What remains is to calculate this rate function. For this, we return to Pfister & Sullivan's work and use
$D(Q \| P) = \phi(Q) - H(Q)$ [5, Prop. 2.1] to write

$$I(t) = \inf_{Q \in \mathcal{M}_s^P(\mathbb{X}^{\mathbb{N}})} \big\{ D(Q \| P) : H(Q) + D(Q \| P) = t \big\}.$$

Finally, the Legendre-Fenchel dual of the rate function is computed as in the steps leading to (25)-(27),
yielding (28).

Example 5 (Mixed source): Consider a mixture of two iid sources with letters from X. We may write 



^n(x") = A n R{^i) + (1 - A) n -^(^ 



I ^7 

i=l i=l 



where A G (0,1) with R, S E A^(X) the two marginal pmfs that define the iid components of the 
mixture. It is easy to see that the guessing exponent is the maximum of the guessing exponents for the 
two component sources. We next verify this using Proposition |7l 

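The sequence of distributions of the information spectrum satisfies the LDP with rate function given as follows (see Han [9, eqn. (1.9.41)]). Define

$$\Theta_1 = \left\{ Q \in \mathcal{M}(\mathbb{X}) : D(Q \| S) - D(Q \| R) > 0 \right\},$$

$$\Theta_2 = \left\{ Q \in \mathcal{M}(\mathbb{X}) : D(Q \| S) - D(Q \| R) < 0 \right\},$$

and for $t \in \mathbb{R}$

$$A_t = \Theta_1 \cap \left\{ Q \in \mathcal{M}(\mathbb{X}) : H(Q) + D(Q \| R) = t \right\},$$

$$B_t = \Theta_2 \cap \left\{ Q \in \mathcal{M}(\mathbb{X}) : H(Q) + D(Q \| S) = t \right\}.$$

The rate function (via the contraction principle) is given by

$$I(t) = \min \left\{ \inf_{Q \in A_t} D(Q \| R), \ \inf_{Q \in B_t} D(Q \| S) \right\}.$$

From Proposition 7 we conclude that the limiting guessing exponent exists. With $\beta = \rho/(1+\rho)$, $I^*(\beta)$ is then

\begin{align*}
& \sup_{t \in \mathbb{R}} \left\{ \beta t - \min \left\{ \inf_{Q \in A_t} D(Q \| R), \ \inf_{Q \in B_t} D(Q \| S) \right\} \right\} \\
&= \max \left\{ \sup_{t \in \mathbb{R}} \sup_{Q \in A_t} \left\{ \beta t - D(Q \| R) \right\}, \ \sup_{t \in \mathbb{R}} \sup_{Q \in B_t} \left\{ \beta t - D(Q \| S) \right\} \right\} \\
&= \max \left\{ \sup_{Q \in \Theta_1} \left\{ \beta H(Q) - (1-\beta) D(Q \| R) \right\}, \ \sup_{Q \in \Theta_2} \left\{ \beta H(Q) - (1-\beta) D(Q \| S) \right\} \right\} \\
&= (1+\rho)^{-1} \max \left\{ \sup_{Q \in \Theta_1} \left\{ \rho H(Q) - D(Q \| R) \right\}, \ \sup_{Q \in \Theta_2} \left\{ \rho H(Q) - D(Q \| S) \right\} \right\} \\
&= (1+\rho)^{-1} \max \left\{ \rho H_{\frac{1}{1+\rho}}(R), \ \rho H_{\frac{1}{1+\rho}}(S) \right\},
\end{align*}

yielding

$$E(\rho) = \max \left\{ \rho H_{\frac{1}{1+\rho}}(R), \ \rho H_{\frac{1}{1+\rho}}(S) \right\}.$$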
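The conclusion of Example 5 can be checked numerically. The following is a minimal sketch, not part of the original derivation, and it assumes a binary alphabet so that $P_n(x^n)$ depends on $x^n$ only through the number of ones. It evaluates the normalized quantity $(\rho/n) H_{1/(1+\rho)}(P_n)$ for the mixture and compares it with the larger of the two component values. The function names and the parameter values (`lam`, `r`, `s`) are illustrative assumptions.

```python
import math


def log_sum_exp(values):
    """Numerically stable ln(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def renyi_entropy_bernoulli(p, alpha):
    """Renyi entropy (in nats) of order alpha != 1 of a Bernoulli(p) letter."""
    return math.log(p ** alpha + (1.0 - p) ** alpha) / (1.0 - alpha)


def mixture_renyi_rate(n, lam, r, s, rho):
    """(rho / n) * H_{1/(1+rho)}(P_n) for P_n = lam * R^n + (1 - lam) * S^n.

    R and S are Bernoulli(r) and Bernoulli(s); this binary setting is an
    assumption made only to keep the sum over x^n tractable.
    """
    alpha = 1.0 / (1.0 + rho)
    terms = []
    for k in range(n + 1):  # k = number of ones in x^n
        log_pr = k * math.log(r) + (n - k) * math.log(1.0 - r)
        log_ps = k * math.log(s) + (n - k) * math.log(1.0 - s)
        log_p = log_sum_exp([math.log(lam) + log_pr,
                             math.log(1.0 - lam) + log_ps])
        log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1))
        terms.append(log_binom + alpha * log_p)
    # rho * H_alpha(P_n) = (1 + rho) * ln sum_{x^n} P_n(x^n)^alpha
    return (1.0 + rho) * log_sum_exp(terms) / n


if __name__ == "__main__":
    rho, lam, r, s = 1.0, 0.3, 0.1, 0.4
    alpha = 1.0 / (1.0 + rho)
    limit = max(rho * renyi_entropy_bernoulli(r, alpha),
                rho * renyi_entropy_bernoulli(s, alpha))
    for n in (10, 100, 1000):
        print(n, mixture_renyi_rate(n, lam, r, s, rho), limit)
```

As $n$ grows, the printed rate should approach the component maximum, with a gap of order $1/n$ coming from the mixture weights.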



V. Proofs 

We now prove Propositions 6 and 7.

A. Proof of Proposition 6

From Theorem 5 it is sufficient to show that the limit in (21) for Campbell's coding problem exists if and only if the Rényi entropy rate exists, with the former $\rho$ times the latter.

Fix $n$. In the rest of the proof, we use the notation $E_{P_n}[\cdot]$ for expectation with respect to distribution $P_n$. The length function can be thought of as a bounded (continuous) function from $\mathbb{X}^n$ to $\mathbb{R}$, and therefore our interest is in the logarithm of its moment generating function, the cumulant, as a function of $\rho$. The cumulant associated with a bounded continuous function (here $L_n$) has a variational characterization [25, Prop. 1.4.2] as the following Legendre-Fenchel dual of the Kullback-Leibler divergence, i.e.,

$$\ln E_{P_n} \left[ \exp\{ (\rho \ln 2) L_n(X^n) \} \right] = \sup_{Q_n \in \mathcal{M}(\mathbb{X}^n)} \left\{ (\rho \ln 2) E_{Q_n}[L_n(X^n)] - D(Q_n \| P_n) \right\}. \tag{31}$$

Taking infimum on both sides over all length functions, we arrive at the following chain of equalities:

\begin{align}
\inf_{L_n} \ln E_{P_n} \left[ \exp\{ (\rho \ln 2) L_n(X^n) \} \right]
&= \inf_{L_n} \sup_{Q_n \in \mathcal{M}(\mathbb{X}^n)} \left\{ E_{Q_n}[(\rho \ln 2) L_n(X^n)] - D(Q_n \| P_n) \right\} \tag{32} \\
&= \sup_{Q_n \in \mathcal{M}(\mathbb{X}^n)} \inf_{L_n} \left\{ E_{Q_n}[(\rho \ln 2) L_n(X^n)] - D(Q_n \| P_n) \right\} + \Theta(1) \tag{33} \\
&= \sup_{Q_n \in \mathcal{M}(\mathbb{X}^n)} \left\{ \rho H(Q_n) - D(Q_n \| P_n) \right\} + \Theta(1) \tag{34} \\
&= \rho H_{\frac{1}{1+\rho}}(P_n) + \Theta(1). \tag{35}
\end{align}

Equation (33) follows because (i) the mapping

$$(L_n, Q_n) \mapsto E_{Q_n}[(\rho \ln 2) L_n(X^n)] - D(Q_n \| P_n)$$

is a concave function of $Q_n$; (ii) for fixed $Q_n$, for any two length functions $L_n^{(1)}$ and $L_n^{(2)}$, and for any $\lambda \in [0, 1]$, the function

$$L_n = \left\lceil \lambda L_n^{(1)} + (1 - \lambda) L_n^{(2)} \right\rceil$$

is also a length function and

$$E_{Q_n}[L_n] = \lambda E_{Q_n}[L_n^{(1)}] + (1 - \lambda) E_{Q_n}[L_n^{(2)}] + \Theta(1);$$

and (iii) $\mathcal{M}(\mathbb{X}^n)$ is compact and convex, and therefore the infimum and supremum may be interchanged upon an application of a version of Ky Fan's minimax result [26]. This yields a compression problem, the infimum over $L_n$ of expected lengths with respect to a distribution $Q_n$. The answer is the well-known Shannon entropy $H(Q_n)$ to within $\ln 2$ nats, and (34) follows. Lastly, (35) is a well-known identity which may also be obtained directly by writing the supremum term in (34) as

$$(1 + \rho) \sup_{Q_n \in \mathcal{M}(\mathbb{X}^n)} \left\{ E_{Q_n} \left[ -\frac{\rho}{1+\rho} \ln P_n(X^n) \right] - D(Q_n \| P_n) \right\}$$

and then applying (31) with $-(\rho/(1+\rho)) \ln P_n(X^n)$ in place of $(\rho \ln 2) L_n(X^n)$ to get the scaled Rényi entropy.

Normalize both (32) and (35) by $n$ and let $n \to \infty$ to deduce that (21) exists if and only if the limiting normalized Rényi entropy rate exists. This concludes the proof.
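The identity behind (35), $\sup_{Q} \{ \rho H(Q) - D(Q \| P) \} = \rho H_{1/(1+\rho)}(P)$, is easy to check numerically on a small alphabet. The sketch below is illustrative only. It assumes a strictly positive pmf `p`, and it uses the fact that the supremum is attained at the tilted distribution $Q^* \propto P^{1/(1+\rho)}$, which follows from the variational formula (31). All function names are hypothetical.

```python
import math
import random


def shannon_entropy(q):
    """Shannon entropy H(Q) in nats."""
    return -sum(x * math.log(x) for x in q if x > 0.0)


def kl_divergence(q, p):
    """Kullback-Leibler divergence D(Q || P) in nats; p must be positive."""
    return sum(x * math.log(x / y) for x, y in zip(q, p) if x > 0.0)


def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha != 1 in nats."""
    return math.log(sum(y ** alpha for y in p)) / (1.0 - alpha)


def objective(q, p, rho):
    """The quantity rho * H(Q) - D(Q || P) appearing in (34)."""
    return rho * shannon_entropy(q) - kl_divergence(q, p)


def tilted(p, rho):
    """Candidate maximizer Q* proportional to P^(1/(1+rho))."""
    alpha = 1.0 / (1.0 + rho)
    weights = [y ** alpha for y in p]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    p, rho = [0.5, 0.3, 0.2], 2.0
    target = rho * renyi_entropy(p, 1.0 / (1.0 + rho))
    print("rho * H_{1/(1+rho)}(P):", target)
    print("objective at tilted Q*:", objective(tilted(p, rho), p, rho))
    random.seed(0)
    best = -math.inf
    for _ in range(1000):
        w = [random.random() + 1e-9 for _ in p]
        q = [x / sum(w) for x in w]
        best = max(best, objective(q, p, rho))
    print("best over random Q:    ", best)
```

The first two printed values should agree up to floating-point error, and no randomly drawn $Q$ should exceed them.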



B. Proof of Proposition 7

This is a straightforward application of Varadhan's theorem [19] on asymptotics of integrals. Recall that $\nu_n$ is the distribution of the information spectrum $-n^{-1} \ln P_n(X^n)$. Define $F(t) = \beta t$. Since the sequence $(\nu_n : n \in \mathbb{N})$ satisfies the LDP with rate function $I$, Varadhan's theorem (see Ellis [18, Th. II.7.1.b]) states that if

$$\lim_{M \to \infty} \limsup_{n \to \infty} \frac{1}{n} \ln \int_{\{t \geq M/\beta\}} \exp\{n \beta t\} \, \nu_n(dt) = -\infty \tag{36}$$

then the limit

$$\lim_{n \to \infty} \frac{1}{n} \ln \int_{\mathbb{R}} \exp\{n \beta t\} \, \nu_n(dt) = \sup_{t \in \mathbb{R}} \left\{ \beta t - I(t) \right\} \tag{37}$$

holds. The integral on the left side in (37) can be simplified by defining the finite cardinality set

$$A_n = \left\{ -n^{-1} \ln P_n(x^n) : x^n \in \mathbb{X}^n \right\} \subset \mathbb{R}$$

and by observing that

\begin{align*}
\int_{\mathbb{R}} \exp\{n \beta t\} \, \nu_n(dt)
&= \sum_{t \in A_n} \exp\{n \beta t\} \sum_{x^n : P_n(x^n) = \exp\{-nt\}} P_n(x^n) \\
&= \sum_{x^n} P_n(x^n)^{\frac{1}{1+\rho}} = \exp\left\{ \beta H_{\frac{1}{1+\rho}}(P_n) \right\}.
\end{align*}

Take logarithms, normalize by $n$, take limits, and apply (37) to get the desired result. It therefore remains to prove (36).
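The identity relating the integral against the information spectrum to the Rényi entropy can be verified by brute force for small $n$. The sketch below is a hedged illustration rather than part of the proof. It assumes an iid source with a strictly positive letter pmf `p`, takes $\beta = \rho/(1+\rho)$, enumerates all $x^n$, builds $\nu_n$ on the finite set $A_n$, and compares both sides. The helper names are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def information_spectrum(p, n):
    """Return {t: nu_n({t})} with t = -(1/n) ln P_n(x^n), for an iid source."""
    nu = defaultdict(float)
    for xs in itertools.product(range(len(p)), repeat=n):
        prob = math.prod(p[x] for x in xs)
        # Round so that sequences of equal probability share one key.
        t = round(-math.log(prob) / n, 12)
        nu[t] += prob
    return dict(nu)


def spectrum_integral(nu, n, beta):
    """Integral of exp(n * beta * t) against nu_n, a finite sum over A_n."""
    return sum(math.exp(n * beta * t) * mass for t, mass in nu.items())


def renyi_entropy_iid(p, n, alpha):
    """Renyi entropy (nats) of order alpha != 1 of the n-fold product of p."""
    return n * math.log(sum(y ** alpha for y in p)) / (1.0 - alpha)


if __name__ == "__main__":
    p, n, rho = [0.7, 0.3], 8, 2.0
    beta = rho / (1.0 + rho)
    alpha = 1.0 / (1.0 + rho)
    nu = information_spectrum(p, n)
    lhs = spectrum_integral(nu, n, beta)
    rhs = math.exp(beta * renyi_entropy_iid(p, n, alpha))
    print("integral against nu_n:  ", lhs)
    print("exp(beta * H_alpha(P_n)):", rhs)
```

The enumeration costs $|\mathbb{X}|^n$ operations, so this check is only practical for small $n$.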

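The event $\{t \geq M/\beta\}$ occurs if and only if

$$P_n(x^n) \leq \exp\{-nM/\beta\}.$$

The integral in (36) can therefore be written as

$$\sum_{t \in A_n, \, t \geq M/\beta} \ \sum_{x^n : P_n(x^n) = \exp\{-nt\}} \exp\{n \beta t\} P_n(x^n) = \sum_{x^n : P_n(x^n) \leq \exp\{-nM/\beta\}} P_n(x^n)^{\frac{1}{1+\rho}} \leq |\mathbb{X}|^n \exp\left\{ -\frac{nM}{\beta(1+\rho)} \right\},$$

where the inequality holds because each of the at most $|\mathbb{X}|^n$ terms is bounded by $\exp\{-nM/(\beta(1+\rho))\}$. The sequence in $n$ on the left side of (36) is then bounded above by the constant sequence $\ln |\mathbb{X}| - M/(\beta(1+\rho))$. Take the limit as $M \to \infty$ to verify (36). This concludes the proof.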



VI. Conclusion 

We first showed that the problem of finding the limiting guessing exponent is equivalent to that of finding the limiting compression exponent under exponential costs (Campbell's coding problem). We then saw that the latter limit exists if the sequence of distributions of the information spectrum satisfies the LDP (sufficient condition). The limiting exponent was the Legendre-Fenchel dual of the rate function, scaled by an appropriate constant. It turned out to be the limit of the normalized cumulant of the information spectrum random variable. While some of these facts can be gleaned from the works of Pfister & Sullivan [5] and Merhav & Arikan [7], our work sheds light on the key role played by the information spectrum. It will be of interest to find a rich class of sources beyond those listed in this paper for which the information spectrum satisfies the LDP.

Results on guessing with key-rate constraints for a general source are provided using the above information spectrum approach in [27].






References 

[1] N. Merhav and E. Arikan, "The Shannon cipher system with a guessing wiretapper," IEEE Trans. Inf. Theory, vol. 45, no. 6, pp. 1860-1866, Sep. 1999.
[2] J. L. Massey, "Guessing and entropy," in Proc. 1994 IEEE International Symposium on Information Theory, Trondheim, Norway, Jun. 1994, p. 204.
[3] E. Arikan, "An inequality on guessing and its application to sequential decoding," IEEE Trans. Inf. Theory, vol. 42, pp. 99-105, Jan. 1996.
[4] D. Malone and W. G. Sullivan, "Guesswork and entropy," IEEE Trans. Inf. Theory, vol. 50, no. 4, pp. 525-526, Mar. 2004.
[5] C.-E. Pfister and W. G. Sullivan, "Rényi entropy, guesswork moments, and large deviations," IEEE Trans. Inf. Theory, vol. 50, no. 11, pp. 2794-2800, Nov. 2004.
[6] R. Sundaresan, "Guessing based on length functions," in Proceedings of the Conference on Managing Complexity in a Distributed World, MCDES, Bangalore, India, May 2008; also available as DRDO-IISc Programme in Mathematical Engineering Technical Report No. TR-PME-2007-02, Feb. 2007. http://pal.ece.iisc.ernet.in/PAM/tech_rep07/TR-PME-2007-02.pdf.
[7] E. Arikan and N. Merhav, "Guessing subject to distortion," IEEE Trans. Inf. Theory, vol. 44, pp. 1041-1056, May 1998.
[8] L. L. Campbell, "A coding theorem and Rényi's entropy," Information and Control, vol. 8, pp. 423-429, 1965.
[9] T. S. Han, Information-Spectrum Methods in Information Theory. Springer-Verlag, 2003.
[10] M. K. Hanawal and R. Sundaresan, "Guessing revisited: A large deviations approach," DRDO-IISc Programme in Mathematical Engineering Technical Report No. TR-PME-2008-08, Dec. 2008, available at http://pal.ece.iisc.ernet.in/PAM/tech_rep08/TR-PME-2008-08.pdf
[11] K. R. Parthasarathy, Coding Theorems of Classical and Quantum Information Theory. TRIM no. 45, Hindustan Book Agency, 2007.
[12] K. Marton and P. C. Shields, "The positive-divergence and blowing-up properties," Israel J. Math., vol. 86, pp. 331-348, 1994.
[13] P. C. Shields, "The interactions between ergodic theory and information theory," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2079-2093, Oct. 1998.
[14] M. J. Weinberger, J. Ziv, and A. Lempel, "On the optimal asymptotic performance of universal ordering and of discrimination of individual sequences," IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 380-385, Mar. 1992.
[15] A. D. Wyner, "An upper bound on the entropy series," Information and Control, vol. 20, no. 2, pp. 176-181, Mar. 1972.
[16] N. Merhav, "Universal coding with minimum probability of codeword length overflow," IEEE Trans. Inf. Theory, vol. 37, no. 3, pp. 556-563, May 1991.
[17] R. Sundaresan, "Guessing under source uncertainty," IEEE Trans. Inf. Theory, vol. 53, no. 1, pp. 269-287, Jan. 2007.
[18] R. S. Ellis, Entropy, Large Deviations, and Statistical Mechanics, ser. Grundlehren der mathematischen Wissenschaften. New York: Springer-Verlag, 1985, vol. 271.
[19] S. R. S. Varadhan, "Asymptotic probabilities and differential equations," Comm. Pure Appl. Math., vol. 19, pp. 261-286, 1966.
[20] A. Dembo and O. Zeitouni, Large Deviation Techniques and Applications, 2nd ed. New York: Springer-Verlag, 1998.
[21] R. S. Ellis, "The theory of large deviations and applications to statistical mechanics," Oct. 2006, lectures for the International Seminar on Extreme Events in Complex Dynamics, Dresden, Germany.
[22] S. Natarajan, "Large deviations, hypotheses testing, and source coding for finite Markov chains," IEEE Trans. Inf. Theory, vol. 31, no. 3, pp. 360-365, May 1985.
[23] E. Seneta, Non-negative Matrices: An Introduction to Theory and Applications. London: George Allen & Unwin Ltd., 1973.
[24] F. den Hollander, Large Deviations. Rhode Island: American Mathematical Society, 2003.
[25] P. Dupuis and R. S. Ellis, A Weak Convergence Approach to the Theory of Large Deviations. New York: John Wiley & Sons, 1997.
[26] I. Joo and L. L. Stacho, "A note on Ky Fan's minimax theorem," Acta Math. Acad. Sci. Hungar., vol. 39, pp. 401-407, 1982.
[27] M. K. Hanawal and R. Sundaresan, "The Shannon cipher system with a guessing wiretapper: General sources," DRDO-IISc Programme in Mathematical Engineering Technical Report No. TR-PME-2009-04, Jan. 2009, available at http://pal.ece.iisc.ernet.in/PAM/tech_rep09/TR-PME-2009-04.pdf.



